2 - 30.2 Grammars and Syntactic Processing [ID:35581]

Oh, and welcome to today's video nugget on grammars for natural language processing.

I'm sure that at least the computer scientists among you already know about grammars, because

grammars were built for handling formal languages.

They're tools for describing formal languages succinctly.

Now I would like to see whether grammars or other symbolic methods might actually be good for our problem of natural language processing.

Remember, for instance, when you learned English in school: one of the things you were taught were grammar rules, like "in English we use subject-predicate-object order" or "you use the past progressive for such-and-such", and so on.

So grammars are, on the one hand, something we use for natural language and, on the other hand, for formal languages, and we would like to reconcile the two and see how we can use them in language-based AI.

So I would like to motivate this from the point of view of language models.

We remember that while character-based language models work well, word-based language models have a problem, because we do not have enough data to build these huge models.

We have seen a couple of issues, like unknown words and out-of-vocabulary words and so on.

But what we would like to do here is essentially use symbolic or symbolic-statistical methods.

The general idea is that we cluster words into what we call syntactic classes, and rather than a model of acceptable word sequences, which is what a word language model is, what we try to write down is a model of acceptable word-class sequences.

These word-class-sequence description languages we call phrase structure grammars.

The advantage of this is that since we cluster words into classes first, we can actually get by with much less information.
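To make this concrete, here is a minimal sketch in Python; the lexicon, the class names, and the single licensed class pattern are made-up illustrations of the idea, not the lecture's actual grammar.

```python
# Sketch: judge a sentence by its word-class sequence rather than by
# the word sequence itself (illustrative lexicon and pattern only).

lexicon = {  # word -> syntactic class
    "the": "Art", "a": "Art",
    "dog": "N", "cat": "N",
    "sleeps": "Vi", "barks": "Vi",
}

# The "language model" over classes: which class sequences are acceptable.
acceptable_class_sequences = {
    ("Art", "N", "Vi"),  # e.g. "the dog sleeps"
}

def is_acceptable(sentence):
    classes = tuple(lexicon.get(word, "UNK") for word in sentence.lower().split())
    return classes in acceptable_class_sequences

print(is_acceptable("the cat barks"))  # True: same class pattern as "the dog sleeps"
print(is_acceptable("barks the cat"))  # False: this class sequence is not licensed
```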

So we can build a language model for, say, the German language with something like 10 to the power of 3 structural rules over a lexicon of a hundred thousand words, and these generate most acceptable German sentences.

And this generative capacity of grammars is something we're interested in.

It gives us relatively good generalizability of the models we have and it condenses the

information.

We can get by with 10 to the 5 facts, if you will, rather than the 10 to the 15 facts we estimated we would need for a German word trigram model.
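The snippet below just redoes that back-of-the-envelope arithmetic in Python, assuming the lecture's figures of a 10 to the 5 word lexicon and roughly 10 to the 3 structural rules.

```python
# Back-of-the-envelope comparison using the figures from the lecture.
vocabulary_size = 10**5          # ~100,000 German words in the lexicon

# A word trigram model in principle has one parameter per word triple.
trigram_facts = vocabulary_size ** 3          # 10^15

# A phrase structure grammar gets by with ~10^3 structural rules
# plus one lexicon entry per word.
grammar_facts = 10**3 + vocabulary_size       # ~10^5

print(f"word trigram model:   ~{trigram_facts:.0e} facts")
print(f"grammar plus lexicon: ~{grammar_facts:.0e} facts")
```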

So that's what we're after here in this section.

Many animals, lower animals below the primates, and certain birds actually use isolated symbols as whole sentences, and with these they can communicate propositions; marmosets, for instance, can signal certain threats, usually from the air, with different calls, but they do not have sentence structure.

Grammars are great for dealing with information sparsity, but they have the disadvantage, of course, that they over- or under-generalize, as any symbolic model does.

And we are building on work by Noam Chomsky, who first used grammars for natural language.

Okay, so let's go into the theory.

I'm assuming many of you have seen this before.

So we define a phrase structure grammar to be a quadruple that has a finite set of nonterminal symbols; in phrase structure grammars we also call them syntactic categories.

In this little grammar we have a syntactic category for sentences, for noun phrases, for articles, for nouns, and for intransitive verbs.

And then we have a finite set of production rules, which are basically rewrite rules: the head is rewritten to the body, where head and body are each made up, in a certain way, of a sequence of terminal and nonterminal symbols.

Okay, I forgot the terminal symbols: besides the nonterminal symbols, the grammar also has a finite set of terminal symbols, which in our case are the words of the lexicon.
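As a concrete illustration, here is a minimal sketch in Python of such a quadruple: the syntactic categories S, NP, Art, N, and Vi are the ones mentioned above, while the tiny lexicon and the specific rules are made up for illustration.

```python
# Minimal sketch of a phrase structure grammar as a quadruple
# (nonterminals, terminals, production rules, start symbol).
# Categories follow the lecture (S, NP, Art, N, Vi); the words
# and rules are illustrative only.
import random

nonterminals = {"S", "NP", "Art", "N", "Vi"}
terminals = {"the", "a", "dog", "cat", "sleeps", "barks"}

# Production rules: head -> list of possible bodies; each body is a
# sequence of terminal and/or nonterminal symbols.
rules = {
    "S":   [["NP", "Vi"]],        # a sentence is a noun phrase plus an intransitive verb
    "NP":  [["Art", "N"]],        # a noun phrase is an article plus a noun
    "Art": [["the"], ["a"]],
    "N":   [["dog"], ["cat"]],
    "Vi":  [["sleeps"], ["barks"]],
}
start_symbol = "S"

def generate(symbol):
    """Randomly rewrite a symbol until only terminal words remain."""
    if symbol in terminals:
        return [symbol]
    body = random.choice(rules[symbol])
    return [word for part in body for word in generate(part)]

if __name__ == "__main__":
    # Derive a few sentences from the start symbol to see the
    # generative capacity of even this tiny grammar.
    for _ in range(3):
        print(" ".join(generate(start_symbol)))
```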
